Abstract

We introduce a protocol for authentication between a human and a computer, where the human is able to use no special hardware other than a dumb terminal. Authentication is based on a shared secret which can be reused polynomially often with no danger of exposure, assuming the conjectured uniform hardness of learning parity functions in the presence of noise. Under this conjecture, the protocol is secure against a polynomially-bounded passive adversary and also some forms of active adversary, although it is not secure against arbitrary active adversaries.
1 Introduction


Consider the common scenario of a human user attempting to authenticate to a (possibly) remote computer over an insecure connection. The traditional password approach is unacceptable, since a network snoop can record the password and will then be able to falsely authenticate as the user at will. Zero-knowledge schemes such as Fiat-Shamir [3] require trusted hardware which can be stolen or compromised. One-time passwords [5] are just that: each is good for only a single authentication. Pads of such passwords are vulnerable to theft and still require a large ratio of “key material” to authentication.

An alternative is a challenge-response protocol in which the user and computer have a shared secret that the user can use to respond to the computer's challenges in such a way that an adversary cannot easily learn the secret. Papers by Matsumoto and Imai [7], Wang et al. [9], and Matsumoto [6] provide schemes which are sufficient for a small number of authentications but are generally vulnerable to an active adversary, or require the user to remember a large secret or perform many calculations to achieve a large number of secure authentications.

Ideally, we would like to have a scheme which allows the user to remember a moderately sized secret (of size, say, $\mathrm{poly}(\log n)$) and remains secure against an adversary even after many authentications (say, any $\mathrm{poly}(n)$ authentications). In this paper, we describe a system based on the problem of learning parity functions in the presence of noise, which achieves these objectives if the underlying problem is uniformly hard.

2 Learning Parity in the Presence of Noise

Suppose the secret shared between the human and the computer is a vector $x$ of length $n$ over $GF(2)$, where $|x| = \log n$; that is, $x$ has $\log n$ positions set to 1 and the rest set to 0. Authentication proceeds as follows: the computer, $C$, generates a random $n$-vector $c$ over $GF(2)$ and sends it to the human, $H$, as a challenge. $H$ responds with the bit $r := c \cdot x$, the dot product over $GF(2)$, and $C$ accepts if $r = c \cdot x$. Clearly, on a single authentication $C$ accepts a legitimate user $H$ with probability 1 and an impostor with probability $\tfrac{1}{2}$; iterating $k$ times results in accepting an impostor with probability $2^{-k}$.

Unfortunately, after observing about $n$ challenge-response pairs between $C$ and $H$ (enough for the challenges to span $GF(2)^n$), the adversary $M$ can use Gaussian elimination to discover the secret $x$ and masquerade as $H$.
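The attack in the last paragraph can be sketched in Python. This is a minimal illustration, assuming a noiseless $H$ and uniformly random challenges; the function names (`random_secret`, `solve_gf2`, `passive_attack`) are hypothetical helpers, not part of the protocol. Once the observed challenges span $GF(2)^n$, elimination returns $x$ exactly.

```python
import math
import random


def random_secret(n, rng):
    """Secret x: an n-bit vector over GF(2) with log2(n) ones, as in the text."""
    ones = set(rng.sample(range(n), int(math.log2(n))))
    return [1 if i in ones else 0 for i in range(n)]


def dot(c, x):
    """Dot product over GF(2)."""
    return sum(ci & xi for ci, xi in zip(c, x)) % 2


def solve_gf2(rows, rhs, n):
    """Gaussian elimination over GF(2).

    Returns the unique solution if the rows span GF(2)^n, otherwise None.
    """
    m = [row[:] + [bit] for row, bit in zip(rows, rhs)]  # augmented matrix
    for col in range(n):
        # Find a pivot among the rows not yet used (rows 0..col-1 already hold pivots).
        sel = next((r for r in range(col, len(m)) if m[r][col]), None)
        if sel is None:
            return None  # not enough independent challenges observed yet
        m[col], m[sel] = m[sel], m[col]
        # Clear this column in every other row (reduced row echelon form).
        for r in range(len(m)):
            if r != col and m[r][col]:
                m[r] = [a ^ b for a, b in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def passive_attack(n, num_pairs, seed=0):
    """Eavesdrop on noiseless challenge-response pairs, then recover x."""
    rng = random.Random(seed)
    x = random_secret(n, rng)
    challenges = [[rng.randrange(2) for _ in range(n)] for _ in range(num_pairs)]
    responses = [dot(c, x) for c in challenges]
    return x, solve_gf2(challenges, responses, n)
```

For example, `passive_attack(16, 64)` returns the secret together with a recovered vector that, with overwhelming probability, equals it; with fewer than $n$ pairs, `solve_gf2` returns `None`.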

Suppose we introduce a parameter $\eta \in (0, \tfrac{1}{2})$ and allow $H$ to respond incorrectly with probability $\eta$; in that case the adversary can no longer simply use Gaussian elimination to learn the secret $x$. This is an instance of the problem of learning parity in the presence of noise (LPN). In fact, the problem of learning $x$ becomes NP-hard in the presence of errors: it is NP-hard even to find an $x$ satisfying a $\tfrac{1}{2} + \delta$ fraction of the challenge-response pairs collected by $M$, for any constant $\delta > 0$ [4]. Of course, the hardness results of Hastad [4] only imply that there exist instances of this problem which cannot be solved in polynomial time unless P = NP; it is still possible that the problem is tractable in the random case. However, the best known algorithm for the general random problem, due to Blum, Kalai and Wasserman [2], requires $2^{O(n/\log n)}$ challenge-response pairs and works in time $2^{O(n/\log n)}$. Here we will give some evidence that this problem is, in fact, uniformly hard and cannot be learned in time and sample size $\mathrm{poly}(n, 1/(\tfrac{1}{2} - \eta))$.

In the following, we will refer to an instance of LPN as an $m \times n$ matrix $A$ (where $m = \mathrm{poly}(n)$), an $m$-vector $b$, and a noise parameter $\eta$; the problem is to find an $n$-vector $x$ such that $|Ax - b| \le \eta m$, where $|\cdot|$ denotes Hamming weight.
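A small checker for the definition above may help fix the conventions: vectors are lists of 0/1 integers, $|\cdot|$ is Hamming weight, and subtraction over $GF(2)$ is XOR. The helper names and the way `noisy_instance` draws labels are illustrative assumptions.

```python
import random


def mat_vec_gf2(A, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & xi for a, xi in zip(row, x)) % 2 for row in A]


def is_solution(A, b, x, eta):
    """True if |Ax - b| <= eta * m, where |.| is Hamming weight and m = len(b)."""
    residual = [p ^ q for p, q in zip(mat_vec_gf2(A, x), b)]  # subtraction is XOR in GF(2)
    return sum(residual) <= eta * len(b)


def noisy_instance(m, n, x, eta, rng):
    """A random LPN instance: uniform A, labels Ax with each bit flipped w.p. eta."""
    A = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
    b = [bit ^ int(rng.random() < eta) for bit in mat_vec_gf2(A, x)]
    return A, b
```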

Lemma 1 (Pseudo-randomizability)

Any instance of LPN can be transformed in polynomial time into an instance chosen uniformly at random from a space of possibilities.

Proof: Choose the $n \times n$ matrix $R$ uniformly at random over $GF(2)$. Then if there is a solution to the instance $(AR, b, \eta)$, say $y$, we have:

$$|(AR)y - b| \le \eta m,$$

and if we let $x := Ry$ we find that $Ax = A(Ry) = (AR)y$, which yields the desired $x$, since:

$$|Ax - b| = |(AR)y - b| \le \eta m.$$

Thus there is a polynomial-time transformation between adversarial instances and pseudo-random instances, such that a solution to the pseudo-random instance can be transformed into a solution to the adversarial instance.
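The identity at the heart of this proof, $A(Ry) = (AR)y$ over $GF(2)$, can be checked numerically. This sketch assumes $R$ is drawn uniformly from all $n \times n$ matrices over $GF(2)$, which is our reading of the proof; `randomize` and `pull_back` are hypothetical names. Because $Ax$ and $(AR)y$ coincide, their residuals against any $b$ have the same Hamming weight.

```python
import random


def mat_mul_gf2(A, B):
    """Product of 0/1 matrices over GF(2)."""
    cols = list(zip(*B))
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in cols] for row in A]


def mat_vec_gf2(A, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in A]


def randomize(A, rng):
    """Lemma 1 sketch: replace A by AR for a uniformly random n x n matrix R."""
    n = len(A[0])
    R = [[rng.randrange(2) for _ in range(n)] for _ in range(n)]
    return mat_mul_gf2(A, R), R


def pull_back(R, y):
    """Map a solution y of (AR, b, eta) to x := Ry for the original (A, b, eta)."""
    return mat_vec_gf2(R, y)
```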

Lemma 2 (Log-Uniformity)

If a random instance $(A, b, \eta)$ of LPN can be solved in time $\mathrm{poly}(n, \log(1/(\tfrac{1}{2} - \eta)))$, then any instance can be solved in time $\mathrm{poly}(n, \log(1/(\tfrac{1}{2} - \eta)))$.

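Proof: Let $\epsilon(\eta) = \tfrac{1}{2} - \eta$, and let $\mathcal{A}$ be an algorithm which solves random instances in time $\mathrm{poly}(n, \log(1/\epsilon(\eta)))$. Let $(A, b, \eta)$ be an adversarial instance of LPN. Create the new instance $(A', b', \eta')$ as follows:

• For each row of $A$, randomly choose $n$ other rows of $A$ and use the sum of that row and these $n$ rows as the corresponding row of $A'$.

• Fill in the corresponding entry of $b'$ by adding the corresponding entries of $b$.

Given the error rate $\eta$ of the initial instance, the error rate $\eta'$ of the new instance is determined by the following lemma (due to Blum, Kalai, and Wasserman):

Lemma 3 Let $(a_1, b_1), \ldots, (a_s, b_s)$ be samples from $(A, b, \eta)$; then $b_1 + \cdots + b_s$ is the correct label for $a_1 + \cdots + a_s$ with probability $\tfrac{1}{2} + \tfrac{1}{2}(1 - 2\eta)^s$.

The proof follows by induction on $s$ [2]. The resulting instance is distributed uniformly; thus $\mathcal{A}$ solves it in time $\mathrm{poly}(n, \log(1/\epsilon(\eta')))$. But note that, applying Lemma 3 with $s = n + 1$ and using $1 - 2\eta = 2\epsilon(\eta)$:

$$\epsilon(\eta') = \tfrac{1}{2}(1 - 2\eta)^{n+1} = 2^{n}\,\epsilon(\eta)^{n+1},$$

so that (with logarithms base 2) $\log(1/\epsilon(\eta')) = (n+1)\log(1/\epsilon(\eta)) - n$, and hence $\mathrm{poly}(n, \log(1/\epsilon(\eta'))) = \mathrm{poly}(n, n\log(1/\epsilon(\eta))) = \mathrm{poly}(n, \log(1/\epsilon(\eta)))$; thus $\mathcal{A}$ solves adversarial instances in time $\mathrm{poly}(n, \log(1/\epsilon(\eta)))$.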
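Lemma 3 and the resulting expression for $\epsilon(\eta')$ are easy to check with exact rational arithmetic. The sketch below, a minimal illustration using Python's `fractions` module, tracks the probability that an even number of the $s$ summed labels were flipped (so the summed label is correct) and compares it with the closed form; the function names are our own.

```python
from fractions import Fraction


def prob_sum_correct(eta, s):
    """Exact P[the sum of s labels is correct], each label flipped independently w.p. eta.

    The sum is correct exactly when an even number of labels were flipped.
    """
    even = Fraction(1)  # probability that the number of flips so far is even
    for _ in range(s):
        even = even * (1 - eta) + (1 - even) * eta
    return even


def lemma3_closed_form(eta, s):
    """1/2 + 1/2 (1 - 2 eta)^s, the probability stated in Lemma 3."""
    return Fraction(1, 2) + Fraction(1, 2) * (1 - 2 * eta) ** s


def epsilon(eta):
    """Distance of the noise rate from 1/2."""
    return Fraction(1, 2) - eta
```

For instance, with $\eta = \tfrac{1}{4}$ and $n = 5$ (so $s = n + 1 = 6$), one gets $\epsilon(\eta') = 2^{-7}$, matching $\log(1/\epsilon(\eta')) = (n+1)\log(1/\epsilon(\eta)) - n = 12 - 5$.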

Conjecture 1 (Hardness of LPN)

LPN is uniformly hard in $n$ and $\eta$: if there is an algorithm to solve a random instance in time $\mathrm{poly}(n, 1/(\tfrac{1}{2} - \eta))$, then any instance can be solved in time $\mathrm{poly}(n, 1/(\tfrac{1}{2} - \eta))$.

Evidence:

• The best known algorithm for the random case, given by Blum, Kalai, and Wasserman [2], has superpolynomial complexity $2^{O(n/\log n)}$.

• Lemmas 1 and 2.

This assumption is not unprecedented; the McEliece public-key cryptosystem [8] relies on a related assumption, and the pseudo-random generator proposed by Blum, Furst, Kearns and Lipton [1] is secure under a very similar assumption.

3 The Simple Protocol

This discussion gives rise to Protocol 1 below; intuitively, $C$ generates the coefficient matrix of some LPN instance while $H$ generates the output vector and some errors. Thus, after a number of repetitions, $C$ can be reasonably sure that $H$ knows the shared secret vector $x$.

Protocol 1

1. $C$ sets $i := 0$.

2. Repeat $k$ times:

(a) $C$ selects a random challenge $c \in_U \{0,1\}^n$ and sends it to $H$.

(b) With probability $1 - \eta$, $H$ responds with $r := c \cdot x$; otherwise $H$ responds with $r := 1 - c \cdot x$.

(c) If $r = c \cdot x$, $C$ increments $i$.

3. If $i \ge (1 - \eta)k$, $C$ accepts $H$.
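Protocol 1 can be simulated as below. This is a sketch under stated assumptions: the human's random errors are modelled by a pseudo-random generator (precisely the resource a real human lacks, as discussed at the end of this section), and `protocol1` and `honest_human` are hypothetical names. The ceiling with a small tolerance guards against floating-point round-off in $(1-\eta)k$.

```python
import math
import random


def dot(c, x):
    """Dot product over GF(2)."""
    return sum(ci & xi for ci, xi in zip(c, x)) % 2


def protocol1(n, k, eta, x, responder, rng):
    """One run of Protocol 1; responder(c) returns H's bit for challenge c."""
    threshold = math.ceil((1 - eta) * k - 1e-9)  # accept iff i >= (1 - eta) k
    i = 0
    for _ in range(k):
        c = [rng.randrange(2) for _ in range(n)]  # step (a)
        if responder(c) == dot(c, x):             # step (c)
            i += 1
    return i >= threshold


def honest_human(x, eta, rng):
    """Step (b): answer c.x, but flip the answer with probability eta."""
    def respond(c):
        correct = dot(c, x)
        return 1 - correct if rng.random() < eta else correct
    return respond
```

Because the acceptance threshold equals the honest user's expected number of correct answers, a simulated honest $H$ whose noise rate is exactly $\eta$ is accepted only roughly half the time for large $k$.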

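Theorem 1 If $H$ guesses random responses $r$, $C$ will accept $H$ with probability at most

$$\left(\tfrac{1}{2}\right)^{k} \sum_{i = \lceil (1-\eta)k \rceil}^{k} \binom{k}{i} \le e^{-c_0 k}.$$

Proof: Let $X$ be the random variable denoting the number of times $H$ guesses correctly. Since each random guess is correct with probability $\tfrac{1}{2}$, the probability of guessing correctly exactly $i$ times out of $k$ is $\binom{k}{i}\left(\tfrac{1}{2}\right)^{k}$; the first result follows from summing the probabilities of guessing correctly $(1 - \eta)k$ or more times. The second result follows by the Chernoff bound $\Pr[X \ge (1+\delta)\mu] \le e^{-\delta^2 \mu / 3}$ with $\mu = k/2$ and $\delta = 1 - 2\eta$, which gives $c_0 = (1 - 2\eta)^2/6$.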
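The two bounds in Theorem 1 can be compared numerically. The sketch below computes the exact binomial tail and the Chernoff bound $e^{-c_0 k}$ with $c_0 = (1-2\eta)^2/6$; the helper names are hypothetical.

```python
import math


def impostor_accept_prob(k, eta):
    """Exact Pr[Bin(k, 1/2) >= (1 - eta) k]: a random guesser passing Protocol 1."""
    threshold = math.ceil((1 - eta) * k - 1e-9)  # tolerance for float round-off
    return sum(math.comb(k, i) for i in range(threshold, k + 1)) / 2 ** k


def chernoff_bound(k, eta):
    """e^{-c0 k} with c0 = (1 - 2 eta)^2 / 6 (mu = k/2, delta = 1 - 2 eta)."""
    c0 = (1 - 2 * eta) ** 2 / 6
    return math.exp(-c0 * k)
```

For $k = 10$ and $\eta = 0.2$ the exact acceptance probability is $56/1024 \approx 0.055$, while the Chernoff bound gives $e^{-0.6} \approx 0.55$: the bound is loose for small $k$ but decays exponentially.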

Theorem 2 If LPN is hard, then Protocol 1 is secure against a passive adversary, for any polynomial number of authentications.

Proof: A passive adversary can only observe challenge-response pairs $(c, r)$, and these pairs form an LPN instance with secret $x$ and noise rate $\eta$; obtaining the secret $x$ can therefore only be accomplished by solving the LPN problem.

Unfortunately, this protocol is not secure against an active adversary: suppose $M$ can insert arbitrary challenges into the interaction; then $M$ can record $n/(k(1 - \eta))$ successful authentications and replay the recorded challenges back to $H$, discarding $(c, r)$ pairs whose responses do not match; after enough replays the remaining pairs are, with high probability, free of errors and can be solved by Gaussian elimination.

There is also one other potential problem with this protocol: can the human $H$ be counted on to make “random enough” errors, that is, to make errors in an unpredictable pattern? Since no hardware is allowed, $H$ cannot even use dice. We are currently engaged in empirical research to determine whether a human can (or will) produce unpredictable errors.
4 A More Secure Protocol


Protocol 2 describes a more complicated protocol which retains the security features of Protocol 1 but places fewer computational requirements on $H$. Intuitively, $C$ and $H$ now share two secrets: one secret allows $H$ to compute whether or not an error should be made in using the second secret, and also how to make that error. Thus the requirement of unpredictability is placed on the computer rather than the human, and the number of rounds necessary for a given security level is reduced.

The protocol is given below. As before, $H$ and $C$ will share secret binary vectors of weight $\log n$; however, now $H$ and $C$ will share two binary vectors $x$ and $y$ of weight $\log n$, and a binary vector $z$ of weight 1. The protocol follows.

Protocol 2

1. $C$ sets $i := 0$.

2. Repeat $k$ times:

(a) $C$ generates a random challenge $c \in_U \{0,1\}^n$.

(b) With probability $1 - \eta$, $C$ modifies $c$ so that $c \cdot y = c \cdot z$, and sets $a := c \cdot x$.

(c) Otherwise, $C$ modifies $c$ so that $c \cdot y \ne c \cdot z$, and sets $a := 1 - c \cdot x$.

(d) $C$ sends $c$ to $H$.

(e) $H$ checks if $c \cdot y = c \cdot z$; if so, $H$ responds with $r := c \cdot x$; otherwise $H$ responds with $r := 1 - c \cdot x$.

(f) $C$ checks if $a = r$ and increments $i$ if true.

3. $C$ accepts $H$ if $i = k$.
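A simulation of Protocol 2 might look as follows. The text does not say how $C$ modifies $c$; as a hypothetical rule, this sketch flips one position where exactly one of $y$ and $z$ is 1, which toggles exactly one of $c \cdot y$ and $c \cdot z$ (this requires $y \ne z$). Secrets are represented as sets of locations, and all function names are illustrative. Note that the human's rule is deterministic; all randomness lives on $C$'s side.

```python
import random


def parity(c, support):
    """c . v over GF(2), where v is given by the set of positions holding a 1."""
    return sum(c[i] for i in support) % 2


def computer_round(n, eta, x, y, z, rng):
    """Steps (a)-(c): return the challenge c and the expected answer a."""
    c = [rng.randrange(2) for _ in range(n)]
    want_equal = not (rng.random() < eta)  # make an error with probability eta
    if (parity(c, y) == parity(c, z)) != want_equal:
        # Hypothetical modification rule: flip one position in the symmetric
        # difference of y and z, toggling exactly one of c.y and c.z.
        t = rng.choice(sorted(set(y) ^ set(z)))
        c[t] ^= 1
    a = parity(c, x) if want_equal else 1 - parity(c, x)
    return c, a


def human_response(c, x, y, z):
    """Step (e): answer c.x if c.y == c.z, otherwise 1 - c.x."""
    if parity(c, y) == parity(c, z):
        return parity(c, x)
    return 1 - parity(c, x)


def protocol2(n, k, eta, x, y, z, responder, rng):
    """Steps 1-3: accept only if all k responses match C's expected answers."""
    i = 0
    for _ in range(k):
        c, a = computer_round(n, eta, x, y, z, rng)
        if responder(c) == a:
            i += 1
    return i == k
```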

Theorem 3 If $H$ guesses random responses $r$, $C$ accepts $H$ with probability at most $2^{-k}$.

Proof: Each of $H$'s guesses is correct with probability $\tfrac{1}{2}$, and $C$ requires all $k$ responses to be correct. The result is immediate.

Theorem 4 (Passive adversary)

If Conjecture 1 is true, Protocol 2 is secure against any polynomially-bounded passive adversary for any polynomial number of authentications.

Proof: Assume that LPN is hard; then we wish to show that any polynomial-time algorithm $\mathcal{A}$ recovering the secrets $x$ or $y$ is also an algorithm to solve LPN. There are three cases:

1. $\mathcal{A}$ ignores $y$ and attempts to recover $x$; in that case, using the rows of the LPN instance as challenges and the corresponding parity bits as responses produces a transcript whose secret is the same as the parity vector sought.

2. $\mathcal{A}$ ignores $x$ and attempts to recover $y$; in that case we can produce a transcript by appending the result column of an LPN instance to the matrix; the recovered $y$ will be the $x$ from the original LPN instance.

3. $\mathcal{A}$ attempts to recover $x$ and $y$ simultaneously. In this case, we create a transcript from an LPN instance $(A, b, \eta)$ by appending $b$ to $A$ and also using $b$ as the sequence of responses. The recovered vectors will both be the $x$ for the LPN instance.
Since LPN is NP-hard and Conjecture 1 states that random instances are equivalent to adversarial instances, no passive adversary can recover the secrets $x$ and $y$ in polynomial time unless P = NP, given the assumption that LPN is uniformly hard.

There are several advantages of Protocol 2 over Protocol 1. First, for a given security parameter and noise rate, the second protocol requires fewer iterations. Protocol 2 is also secure against the replay attack mentioned previously, and does not rely on $H$ to produce cryptographically strong randomness. On the other hand, the protocol is still not secure against an active adversary who can submit any challenge; it is an open question whether any protocol exists which a human can execute that is secure against a polynomially bounded active adversary for polynomially many authentications.

5 A Concrete Protocol

A human executing this protocol would most likely conceptualize the vectors $x$, $y$ and $z$ as sets of $\log n$, $\log n$, and 1 challenge locations, respectively. A reasonable question is: to what extent can $x$ and $y$ overlap (reducing the memory requirement on the human) without loss of security? We conjecture that the size of the intersection may be as much as $\tfrac{1}{2}\log n$ with no significant loss of security, but that for every element in the intersection above that threshold, the search complexity for either $x$ or $y$ is reduced by a factor of $n$. Thus this protocol has a secret size of $\tfrac{3}{2}\log n + 1$ locations; since describing each location requires $\log n$ bits, the secret is $O(\log^2 n)$ bits in length.

Alternatively, note that knowledge of one of $x$ and $y$ immediately gives knowledge of the other; thus, while $x$ and $y$ should not be the same set of locations, it would appear that the security of the scheme would not be threatened by defining one as a simple transformation of the other, say $y_i = x_{(i+1) \bmod n}$. Thus $H$ need only remember $\log n + 1$ locations to achieve the same level of security.
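The shift $y_i = x_{(i+1) \bmod n}$ is simple to state both in the vector view and in the set-of-locations view a human would use. A minimal sketch, with hypothetical function names; the last helper counts the $\log n$ locations of $x$ plus the single location of $z$.

```python
import math


def derive_y(x):
    """y_i = x_{(i+1) mod n}: y is x cyclically shifted left by one position."""
    n = len(x)
    return [x[(i + 1) % n] for i in range(n)]


def derive_y_locations(x_locations, n):
    """Same map on locations: a 1 at location j of x becomes a 1 at (j - 1) mod n in y."""
    return {(j - 1) % n for j in x_locations}


def locations_to_remember(n):
    """log n locations for x (y is derived from x) plus one location for z."""
    return int(math.log2(n)) + 1
```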

As an additional consideration for the human user, the challenges $c$ could be selected from $\{0, 1, \ldots, 9\}^n$ and the arithmetic done mod 10, a natural base for many humans. This reduces the number of iterations necessary for a given security level by a constant factor. It also requires modifying the method of making an error: when $c \cdot y \ne c \cdot z \pmod{10}$, set the correct answer $a := c \cdot y \bmod 10$ rather than $1 - c \cdot x$; this construction apparently preserves the security of Protocol 2 while decreasing the computational load on the human.
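$H$'s side of the decimal variant might look as follows. This is a sketch assuming $x$, $y$, $z$ are held as sets of locations and $c$ is a list of digits; the text only specifies the error rule, so the names here are illustrative.

```python
def dot_mod10(c, locations):
    """Sum of the challenge digits at the given locations, mod 10."""
    return sum(c[i] for i in locations) % 10


def decimal_response(c, x, y, z):
    """Answer c.x mod 10 if c.y == c.z (mod 10); otherwise answer c.y mod 10."""
    cy = dot_mod10(c, y)
    if cy == dot_mod10(c, z):
        return dot_mod10(c, x)
    return cy
```

For example, with $c = (3, 7, 1, 9, 4, 0, 2, 5)$, $x = \{0, 2, 5\}$, $y = \{1, 3, 6\}$ and $z = \{4\}$ (0-indexed), $c \cdot y = 18 \equiv 8$ differs from $c \cdot z = 4$, so $H$ answers 8.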

Finally, we select concrete parameters for this protocol which we believe will give adequate security. We recommend using the above modifications and Protocol 2 with $n \ge 128$, and using 12 locations each for $x$ and $y$. Thus the user will need to remember 19 locations in a 128-element challenge; this will correspond to a security factor of $2^{7 \cdot 12} = 2^{84}$. For $n \approx 1000$, the same secret size provides a security factor of approximately $2^{10 \cdot 12} = 2^{120}$, but searching a 1000-element challenge may be too conceptually difficult for many humans. We also recommend using $k \ge 6$, making an adversary's probability of randomly guessing a correct sequence of responses at most $10^{-6}$. Finally, the error rate $\eta$ should be between 0.2 and 0.8; it can be tuned to optimize the expected number of mod 10 additions $H$ must perform.
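The arithmetic behind these recommendations can be reproduced as below. It assumes, as our reading of the text, that the security factor counts roughly $n^{12}$ possibilities for 12 locations among $n$ (each location costing $\log n$ bits), and that the count of 19 locations comes from $x$ and $y$ sharing 6 locations plus one location for $z$; both readings are assumptions, and the helper names are hypothetical.

```python
import math


def search_bits(n, locations):
    """log2 of n**locations: brute-force search over `locations` positions among n."""
    return locations * math.log2(n)


def locations_remembered(size_x, size_y, overlap):
    """Distinct locations of x and y, plus the single location of z."""
    return size_x + size_y - overlap + 1


def guess_probability(k, base=10):
    """Chance that k independent uniform guesses in the given base are all correct."""
    return base ** -k
```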

6 Conclusions

We have given a protocol which we believe a human can execute, having relatively small secret size and an adequate security margin against a passive adversary and some forms of active adversaries. Unfortunately, this protocol is not secure against an active adversary, in the sense that it can be compromised within $p(n)$ steps, for some polynomial $p$, by inserting cleverly chosen challenges. One important question is whether a protocol exists that is secure against such an adversary and can be executed by a human. Another question is whether the presented protocols can be generalized in a way that does not require extensive numerical computation on the part of the human user.


References

[1] Avrim Blum, Merrick Furst, Michael Kearns, and Richard J. Lipton. Cryptographic primitives based on hard learning problems. In Douglas R. Stinson, editor, Advances in Cryptology – CRYPTO '93, volume 773 of Lecture Notes in Computer Science, pages 278–291. Springer-Verlag, 22–26 August 1993.

[2] Avrim Blum, Adam Kalai, and Hal Wasserman. Noise-tolerant learning, the parity problem, and the statistical query model. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, Portland, Oregon, 21–23 May 2000.

[3] Amos Fiat and Adi Shamir. How to prove yourself: Practical solutions to identification and signature problems. In A. M. Odlyzko, editor, Advances in Cryptology – CRYPTO '86, volume 263 of Lecture Notes in Computer Science, pages 186–194. Springer-Verlag, 1987, 11–15 August 1986.

[4] Johan Hastad. Some optimal inapproximability results. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 1–10, El Paso, Texas, 4–6 May 1997.

[5] Leslie Lamport. Password authentication with insecure communication. Communications of the ACM, 24(11), November 1981.

[6] Tsutomu Matsumoto. Human-computer cryptography: An attempt. In Clifford Neuman, editor, 3rd ACM Conference on Computer and Communications Security, pages 68–75, New Delhi, India, March 1996. ACM Press.

[7] Tsutomu Matsumoto and Hideki Imai. Human identification through insecure channel. In D. W. Davies, editor, Advances in Cryptology – EUROCRYPT '91, volume 547 of Lecture Notes in Computer Science, pages 409–421. Springer-Verlag, 8–11 April 1991.

[8] R. J. McEliece. A public-key cryptosystem based on algebraic coding theory. Technical report, Jet Propulsion Laboratory, 1978. Deep Space Network Progress Report.

[9] Chih-Hung Wang, Tzonelih Hwang, and Jiun-Jang Tsai. On the Matsumoto and Imai's human identification scheme. In Louis C. Guillou and Jean-Jacques Quisquater, editors, Advances in Cryptology – EUROCRYPT '95, volume 921 of Lecture Notes in Computer Science, pages 382–392. Springer-Verlag, 21–25 May 1995.




